Fabricating conversational speech data with acoustic models: a program to examine model-data mismatch

Authors

Don McAllaster

Larry Gillick

Francesco Scattone

Michael Newman

Abstract

We present a study of data simulated using acoustic models trained on Switchboard data and then recognized using various Switchboard-trained acoustic models. When we recognize real Switchboard conversations, simple development models give a word error rate (WER) of about 47 percent. If instead we simulate the speech data using word transcriptions of the conversation, obtaining the pronunciations for the words from our recognition dictionary, the WER drops by a factor of five to ten. In a third type of experiment, we use human-generated phonetic transcripts to fabricate data that more realistically represents conversational speech, and obtain WERs in the low 40s, rates fairly similar to those seen on actual speech data. Taken as a whole, these and other experiments described in the paper suggest that there is a substantial mismatch between real speech and the combination of our acoustic models and the pronunciations in our recognition dictionary. Simulation appears to be a promising tool in our efforts to understand and reduce this mismatch, and may prove to be a generally valuable diagnostic in speech recognition research.
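The comparisons in the abstract all rest on word error rate. As a rough illustration only, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below assumes whitespace-tokenized transcripts; the function name `word_error_rate` is hypothetical and is not taken from the paper, whose scoring setup is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / number of reference words.

    Assumption: both transcripts are plain strings tokenized on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a recognizer that gets one word wrong in a three-word utterance scores a WER of one third, and insertions can push the value above 1.0 because the denominator counts only reference words.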

Similar resources

Resegmentation of SWITCHBOARD

The SWITCHBOARD (SWB) corpus is one of the most important benchmarks for recognition tasks involving large vocabulary conversational speech (LVCSR). The high error rates on SWB are largely attributable to an acoustic model mismatch, the high frequency of poorly articulated monosyllabic words, and large variations in pronunciations. It is imperative to improve the quality of segmentations and tr...

Joint Uncertainty Decoding for Robust Large Vocabulary Speech Recognition

Standard techniques to increase automatic speech recognition noise robustness typically assume recognition models are clean trained. This “clean” training data may in fact not be clean at all, but may contain channel variations, varying noise conditions, as well as different speakers. Hence rather than considering noise robustness techniques as compensating clean acoustic models for environment...

A robust compensation strategy for extraneous acoustic variations in spontaneous speech recognition

In this paper, we propose a robust compensation strategy to deal effectively with extraneous acoustic variations for spontaneous speech recognition. This strategy extends speaker adaptive training, and uses hidden Markov models (HMM) parameter transformations to normalize the extraneous variations in the training data according to a set of predefined conditions. A “compact” model and the associ...

Transcription of Russian conversational speech

This paper presents initial work in transcribing conversational telephone speech in Russian. Acoustic seed models were derived from other languages. The initial studies are carried out with 9 hours of transcribed data, and explore the choice of the phone set and use of other data types to improve transcription performance. Discriminant features produced by a Multi Layer Perceptron trained on a ...

Support vector machines for automatic data cleanup

Accurate training data plays a very important role in training effective acoustic models for speech recognition. In conversational speech, in several cases, the transcribed data has a significant word error rate which leads to bad acoustic models. In this paper we explore a method to automatically identify such mislabelled data in the context of a hybrid Support Vector Machine/hidden Markov mod...

Publication date: 1998
